PEL: Position-Enhanced Length Filter for Set Similarity Joins

نویسندگان

  • Willi Mann
  • Nikolaus Augsten
چکیده

Set similarity joins compute all pairs of similar sets from two collections of sets. Set similarity joins are typically implemented in a filter-verify framework: a filter generates candidate pairs, possibly including false positives, which must be verified to produce the final join result. Good filters produce a small number of false positives, while they reduce the time they spend on hopeless candidates. The best known algorithms generate candidates using the so-called prefix filter in conjunction with lengthand position-based filters. In this paper we show that the potential of length and position have only partially been leveraged. We propose a new filter, the position-enhanced length filter, which exploits the matching position to incrementally tighten the length filter; our filter identifies hopeless candidates and avoids processing them. The filter is very efficient, requires no change in the data structures of most prefix filter algorithms, and is particularly effective for foreign joins, i.e., joins between two different collections of sets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Empirical Evaluation of Set Similarity Join Techniques

Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct extensive experiments on seven state-of-the-art algorithms for set similarity joins. These algorithms adopt a filter-verification approach. Our analysis shows that verification has not received enough attention in previous works. In practice, efficient verification inspects only a small, constant num...

متن کامل

Fixed-point FPGA Implementation of a Kalman Filter for Range and Velocity Estimation of Moving Targets

Tracking filters are extensively used within object tracking systems in order to provide consecutive smooth estimations of position and velocity of the object with minimum error. Namely, Kalman filter and its numerous variants are widely known as simple yet effective linear tracking filters in many diverse applications. In this paper, an effective method is proposed for designing and implementa...

متن کامل

3D motion vector coding with block base adaptive interpolation filter on H.264

Fractional pel motion compensation generally improves coding efficiency due to more precise motion accuracy and low path filtering effect in generating image at fractional pel positions. In H.264, quarter pel motion compensation is applied, where image at half pel position is generated by 6 tap Wiener filter. And the adaptive interpolation filter technique, which adaptively changes filter chara...

متن کامل

Bitmap Filter: Speeding up Exact Set Similarity Joins with Bitwise Operations

The Exact Set Similarity Join problem aims to find all similar sets between two collections of sets, with respect to a threshold and a similarity function such as overlap, Jaccard, dice or cosine. The näıve approach verifies all pairs of sets and it is often considered impractical due the high number of combinations. So, Exact Set Similarity Join algorithms are usually based on the Filter-Verif...

متن کامل

Similarity Joins in Relational Database Systems

State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. is book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014